03/10/2019
import numpy as np
data = np.loadtxt('olympic100m.txt', delimiter=',') # load olympic data
x = # make x a column vector
t = # make t a column vector
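One possible way to fill in this cell. The file layout (two comma-separated columns: year, winning time) is assumed from the `loadtxt` call above; here a few illustrative rows of the Olympic 100m data stand in for the file so the snippet is self-contained:

```python
import numpy as np

# data = np.loadtxt('olympic100m.txt', delimiter=',')  # as in the cell above
# Illustrative rows standing in for the loaded file (year, winning time):
data = np.array([[1896, 12.00], [1900, 11.00], [1904, 11.00], [2008, 9.69]])

x = data[:, 0].reshape(-1, 1)  # years as an N x 1 column vector
t = data[:, 1].reshape(-1, 1)  # winning times as an N x 1 column vector
```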
It's useful to start with a plot
%matplotlib inline
import pylab as plt
# plot x (x-axis) vs t (y-axis) with matplotlib's plot function: plt.plot
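A possible plotting cell, using a toy stand-in for the loaded data so it runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

# toy stand-ins for the loaded Olympic data (x: years, t: winning times)
x = np.array([1896, 1900, 2008]).reshape(-1, 1)
t = np.array([12.00, 11.00, 9.69]).reshape(-1, 1)

lines = plt.plot(x, t, 'o')  # one 'o' marker per data point
plt.xlabel('Olympic year')
plt.ylabel('Winning time (s)')
```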
Let's plot the loss function in 1D and 2D ($w_0$ and $w_1$)
Recall that the average squared loss was given by:
$$ L = \frac{1}{N}\sum_{n=1}^N (t_n - w_0 - w_1x_n)^2 $$
$L$ is a function of $w_0$ and $w_1$; all $x_n$ and $t_n$ are given.
num_candidates = # number of candidates, e.g. 100
w0_candidates = # generate a numpy array of possible w0 values e.g. -10 to 82 with np.linspace
w1_candidates = # generate a numpy array of possible w1 values e.g. -0.037 to 0.01
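One way to fill in these blanks, using exactly the ranges suggested in the comments:

```python
import numpy as np

num_candidates = 100
w0_candidates = np.linspace(-10, 82, num_candidates)       # candidate intercepts
w1_candidates = np.linspace(-0.037, 0.01, num_candidates)  # candidate slopes
```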
The average loss $L$ has both $w_0$ and $w_1$ as variables. In order to plot $L$, we have to fix one variable. For example, to plot $L$ vs $w_1$, we fix $w_0 = 36.4164559025$. See how the plot changes when you change $w_0$.
L = # preallocating a vector for L e.g. np.zeros with num_candidates
for j in range(num_candidates): # For loop to evaluate L at every w1_candidates with
                                # w0 = 36.4164559025
    L[j] = # Write code to compute the average squared loss.
           # The "np.mean" function could be helpful.
           # Make sure only w1_candidates is changing while w0 is fixed at 36.4164559025
# plot w1_candidates (x-axis) vs L (y-axis) here
plt.plot( , , 'ro') # also plot the point w1 = -0.013330885711, L = 0.05
plt.xlabel('$w_1$')
plt.ylabel('$L$')
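A possible solution for this cell, with toy data standing in for the loaded Olympic arrays so it is self-contained:

```python
import numpy as np

# toy stand-ins for the Olympic x and t (the real cell uses the loaded data)
x = np.array([1900.0, 1950.0, 2000.0])
t = np.array([11.0, 10.3, 9.9])

num_candidates = 100
w1_candidates = np.linspace(-0.037, 0.01, num_candidates)
w0 = 36.4164559025  # fixed intercept

L = np.zeros(num_candidates)
for j in range(num_candidates):
    # average squared loss at (w0, w1_candidates[j])
    L[j] = np.mean((t - w0 - w1_candidates[j] * x) ** 2)
```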
Let's plot the average squared loss in $w_0$, this time fix $w_1 = -0.013330885711$.
L = # preallocating a vector for L e.g. np.zeros again with num_candidates
for j in range(num_candidates): # For loop to evaluate L at every w0_candidates
                                # with w1 = -0.013330885711
    L[j] = # Write code to compute the average squared loss.
           # The "np.mean" function could be helpful.
           # Make sure only w0_candidates is changing while w1 is fixed at -0.013330885711
# plot w0_candidates (x-axis) vs L (y-axis) here
plt.plot( , , 'ro') # also plot the point w0 = 36.4164559025, L = 0.05
plt.xlabel('$w_0$')
plt.ylabel('$L$')
Now, we let $w_0$ and $w_1$ change at the same time, and make the contour plot
L = # Preallocate the loss; we need num_candidates times num_candidates values.
# You can use np.zeros again; this time L has to be a num_candidates by num_candidates matrix.
# Two nested for loops
for i in range(num_candidates): # changing index of w0_candidates
    for j in range(num_candidates): # changing index of w1_candidates
        L[i,j] = # Write code to compute the average squared loss.
                 # Can you compute the average squared loss without np.mean?
                 # This time make sure both w0_candidates and w1_candidates are changing
X, Y = np.meshgrid(w0_candidates, w1_candidates) # Make the x and y coordinates for contour plot
plt.contour(X, Y, L.T, 50) # transpose L so w0 varies along the x-axis; change the number 50 to see what happens
plt.plot(, ,'ro') # plot the point w0= 36.4164559025, w1=-0.013330885711
plt.xlabel('$w_0$')
plt.ylabel('$w_1$')
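A possible solution for the contour cell, again with toy stand-in data. Note one subtlety: with `L[i, j]` indexed as (w0, w1) but `meshgrid`'s default 'xy' indexing putting the x-axis (w0) along columns, `L` has to be transposed before contouring:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1900.0, 1950.0, 2000.0])
t = np.array([11.0, 10.3, 9.9])

num_candidates = 50
w0_candidates = np.linspace(-10, 82, num_candidates)
w1_candidates = np.linspace(-0.037, 0.01, num_candidates)

L = np.zeros((num_candidates, num_candidates))
for i in range(num_candidates):        # index into w0_candidates
    for j in range(num_candidates):    # index into w1_candidates
        # average squared loss without np.mean: sum the squared errors, divide by N
        errors = t - w0_candidates[i] - w1_candidates[j] * x
        L[i, j] = np.sum(errors ** 2) / len(t)

X, Y = np.meshgrid(w0_candidates, w1_candidates)
plt.contour(X, Y, L.T, 50)  # transpose so w0 varies along the x-axis
```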
Repeat until convergence { $$ w_i = w_i - \alpha \frac{\partial L(w_0, w_1)}{\partial w_i} $$ }
It will be easier if we wrap the code that computes the loss and the gradient in functions.
def loss(x, t, w0, w1): # define the loss function
    L = # the average squared loss function
    return L

def gradient(x, t, w0, w1): # define the gradient function
    g0 = # partial derivative with respect to w0. This should be just one number.
    g1 = # partial derivative with respect to w1. This should be just one number.
    g = np.array([g0, g1])
    return g
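A possible implementation of the two helpers, assuming `x` and `t` are 1-D arrays (flatten column vectors with `.ravel()` first if needed). The partial derivatives follow directly from differentiating the average squared loss:

```python
import numpy as np

def loss(x, t, w0, w1):
    # average squared loss: L = (1/N) * sum_n (t_n - w0 - w1*x_n)^2
    return np.mean((t - w0 - w1 * x) ** 2)

def gradient(x, t, w0, w1):
    residual = t - w0 - w1 * x
    g0 = -2 * np.mean(residual)      # dL/dw0
    g1 = -2 * np.mean(x * residual)  # dL/dw1
    return np.array([g0, g1])

# sanity check: on data lying exactly on t = 1 + 2x, the loss and gradient vanish
x_demo = np.array([0.0, 1.0, 2.0])
t_demo = 1.0 + 2.0 * x_demo
```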
Our Olympic data turns out to be quite a challenging dataset for gradient descent. Try changing the values of alpha and precision in the following cell to see what happens.
alpha = 2e-7 # learning rate
precision = 1e-6 # convergence criterion
w_old = np.zeros(2) # Initial "old" guess
w_new = np.array([36, 0.1]) # A natural starting point
w_list, l_list = [w_old], [loss(x, t, w_new[0], w_new[1] )] # two lists to store w0, w1 and loss
rate_modifier = np.array([1e6, 1]) # modified rate due to the difference in scale between w0 and w1
while sum(abs(w_new - w_old)) > precision: # check convergence
    w_old = w_new # save the current parameters as the old guess
    g = gradient(x, t, w_old[0], w_old[1]) # compute gradient at w_old
    w_new = w_old - alpha*rate_modifier * g # update parameters
    w_list.append(w_new) # store w
    l_list.append(loss(x, t, w_new[0], w_new[1])) # store loss
print("Minimum loss occurs at:", w_new)
print("Minimum loss is:", float(loss(x, t, w_new[0], w_new[1])))
print("Gradient:", gradient(x, t, w_new[0], w_new[1]))
print("Number of steps:", len(l_list))
x_test = # generate new x to plot the fitted line. Note: it's better not to use the original x!
f_test = # compute the corresponding prediction by the fitted model
# plot the fitted line
# plot data
plt.xlabel('Olympic year')
plt.ylabel('Winning time (s)')
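One way to fill in this cell. The fitted parameters here are hypothetical placeholders; in the notebook you would use `w_new[0]` and `w_new[1]` from gradient descent:

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical fitted parameters; use w_new[0], w_new[1] in the notebook
w0_fit, w1_fit = 36.416, -0.01333

x_test = np.linspace(1896, 2012, 100)  # a fresh grid of years, not the original x
f_test = w0_fit + w1_fit * x_test      # predictions of the fitted line

plt.plot(x_test, f_test, 'b-')  # fitted line
# plt.plot(x, t, 'o')           # add the data points once x and t are loaded
plt.xlabel('Olympic year')
plt.ylabel('Winning time (s)')
```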
Use the w_list and l_list to visualise the trace of $w_0$ and $w_1$ during gradient descent
w_list = np.asanyarray(w_list) # convert the parameter list to a numpy array
plt.plot(w_list[::2000,0], w_list[::2000,1], "ro--") # only showing a subsample of the points
plt.plot(w_list[0,0], w_list[0,1], "bo") # plot the 1st point
plt.plot(w_list[-1,0], w_list[-1,1], "go") # plot the final point
plt.contour(X, Y, L.T, 50) # contour again (transposed to match the meshgrid axes)
plt.xlabel('$w_0$')
plt.ylabel('$w_1$')
plt.plot(l_list) # plot l_list
plt.xlabel('iterations')
plt.ylabel('$L$')
plt.plot(l_list) # zoom in
plt.ylim((0,1))
plt.xlabel('iterations')
plt.ylabel('$L$')
Solving $$\frac{\partial L(w_0, w_1)}{\partial w_0} = 0, \quad \frac{\partial L(w_0, w_1)}{\partial w_1} = 0 $$ gives the values of $w_0$ and $w_1$ that minimise the average loss: $$ w_1 = \frac{\bar{x}\bar{t} - \bar{xt}}{\bar{x}\bar{x} - \bar{x^2}} $$ and $$ w_0 = \bar{t} - w_1\bar{x} $$ where $\bar{z} = \frac{1}{N}\sum_{n=1}^N z_n$.
xbar =
tbar =
xxbar =
xtbar =
print(xbar)
print(tbar)
print(xxbar)
print(xtbar)
w1 =
w0 =
print(w0)
print(w1)
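A possible solution for the least squares cell. Toy data lying exactly on the line $t = 1 + 2.5x - $ no, on $t = 1 + 2x$ stands in for the Olympic arrays, so the recovered parameters can be checked by eye:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])  # toy stand-in for the Olympic years
t = 1.0 + 2.0 * x                   # targets exactly on the line t = 1 + 2x

xbar = np.mean(x)
tbar = np.mean(t)
xxbar = np.mean(x ** 2)  # mean of x^2
xtbar = np.mean(x * t)   # mean of x*t

w1 = (xbar * tbar - xtbar) / (xbar ** 2 - xxbar)
w0 = tbar - w1 * xbar
```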
x_test = # generate new x to plot the fitted line. Note: it's better not to use the original x!
f_test = # compute the corresponding target variables
# plot the fitted line
# plot data
plt.xlabel('Olympic year')
plt.ylabel('Winning time (s)')
We can now compute the prediction at 2012:
win_2012_least_square = # make a prediction with the least square solution
win_2012_gradient_descent = # make a prediction with the gradient descent solution
print(win_2012_least_square)
print(win_2012_gradient_descent)
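A possible prediction cell. The parameter values below are hypothetical placeholders; in the notebook you would plug in the `w0`, `w1` from the least squares cell and the `w_new` from gradient descent:

```python
# hypothetical fitted parameters from the two approaches
w0_ls, w1_ls = 36.416, -0.013330885711  # least squares placeholder
w0_gd, w1_gd = 36.0, -0.0131            # gradient descent placeholder

win_2012_least_square = w0_ls + w1_ls * 2012
win_2012_gradient_descent = w0_gd + w1_gd * 2012
print(win_2012_least_square)
print(win_2012_gradient_descent)
```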
Let's simulate some data using the following model
$$ t_n = w_0 + w_1 x_n + w_2 x_n^2 $$
x = # generate x
t = # generate the corresponding t with above model
# plot x and t
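One way to simulate the data. The input range and sample size are assumptions; the parameter values are the ones stated in the next cell:

```python
import numpy as np

rng = np.random.default_rng(0)
w0_true, w1_true, w2_true = 1.0, 2.5, 3.0    # the model parameters stated below

x = rng.uniform(-1, 1, size=100)             # 100 inputs; the range is an assumption
t = w0_true + w1_true * x + w2_true * x ** 2
# optionally add observation noise: t += rng.normal(0, 0.1, size=x.shape)
# plt.plot(x, t, 'o') would then show the quadratic shape
```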
Now, assume that we didn't know $w_0=1$, $w_1=2.5$, and $w_2 = 3$. We only have the data, x and t
def new_loss(x, t, w0, w1, w2): # define average squared loss function
    L =
    return L

def new_gradient(x, t, w0, w1, w2): # define the gradient
    g0 = # partial derivative with respect to w0
    g1 = # partial derivative with respect to w1
    g2 = # partial derivative with respect to w2
    g = np.array([g0, g1, g2])
    return g
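A possible implementation, mirroring the linear case with one extra term for $w_2$:

```python
import numpy as np

def new_loss(x, t, w0, w1, w2):
    # average squared loss for the quadratic model
    return np.mean((t - w0 - w1 * x - w2 * x ** 2) ** 2)

def new_gradient(x, t, w0, w1, w2):
    residual = t - w0 - w1 * x - w2 * x ** 2
    g0 = -2 * np.mean(residual)            # dL/dw0
    g1 = -2 * np.mean(x * residual)        # dL/dw1
    g2 = -2 * np.mean(x ** 2 * residual)   # dL/dw2
    return np.array([g0, g1, g2])

# sanity check: at the true parameters the loss and gradient vanish
x_demo = np.array([-1.0, 0.0, 1.0, 2.0])
t_demo = 1.0 + 2.5 * x_demo + 3.0 * x_demo ** 2
```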
alpha = 1e-2
precision = 1e-6
w_old = np.zeros(3)
w_new = np.array([-3, -0.1, -1])
w_list, l_list = [w_old], [new_loss(x, t, w_new[0], w_new[1], w_new[2])]
rate_modifier = np.array([1, 1, 1])
while sum(abs(w_new - w_old)) > precision:
    w_old = w_new
    g = new_gradient(x, t, w_old[0], w_old[1], w_old[2])
    w_new = w_old - alpha*rate_modifier * g
    w_list.append(w_new)
    l_list.append(new_loss(x, t, w_new[0], w_new[1], w_new[2]))
print("Minimum loss occurs at:", w_new)
print("Minimum loss is:", float(new_loss(x, t, w_new[0], w_new[1], w_new[2])))
print("Gradient:", new_gradient(x, t, w_new[0], w_new[1], w_new[2]))
print("Number of steps:", len(l_list))
x_test = # generate new x to plot the fitted line. Note: it's better not to use the original x!
f_test = # compute the corresponding target variables
# plot the fitted line
# plot data